Determining the Optimal Number of Clusters in Cluster Analysis
Abstract
Cluster analysis is a multivariate method whose objective is to classify objects. The literature offers many clustering methods and many distance measures, which can be combined with one another. There is no manual or rule that clearly identifies the appropriate combination of method and distance measure for a given clustering task. At the same time, cluster analysis often requires determining the optimal number of clusters into which the objects are to be classified. The aim of this paper is to illustrate ways of determining the number of clusters and to evaluate selected coefficients for determining the number of clusters in combination with different clustering methods and different distance measures. For example, the CHF coefficient is better suited to the Mahalanobis distance, where its success rate is higher than with the Euclidean distance; with the average linkage method, the success rate is higher by 21.88%. On the other hand, the D-B coefficient is more successful with the Euclidean distance; with Ward's method, the success rate is higher by 15.63%.
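As a rough illustration of how such coefficients can be used to pick the number of clusters, the sketch below runs average linkage clustering for several candidate values of k and scores each partition. It assumes that "CHF" refers to the Calinski-Harabasz pseudo-F index and "D-B" to the Davies-Bouldin index, which the abstract does not spell out. The data are synthetic, only the Euclidean distance is used, and all function names are hypothetical; this is not the paper's experiment.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def average_linkage(points, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return [[points[i] for i in c] for c in clusters]

def calinski_harabasz(clusters):
    """Between-cluster over within-cluster dispersion; higher is better."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    g = centroid(all_points)
    between = sum(len(c) * euclidean(centroid(c), g) ** 2 for c in clusters)
    within = sum(euclidean(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(clusters):
    """Mean worst-case similarity between clusters; lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(euclidean(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Synthetic data (an assumption for illustration): three separated 2-D blobs.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(10)]

# Score each candidate number of clusters with both coefficients.
scores = {}
for k in range(2, 7):
    clusters = average_linkage(points, k)
    scores[k] = (calinski_harabasz(clusters), davies_bouldin(clusters))

best_k_ch = max(scores, key=lambda k: scores[k][0])  # maximize pseudo-F
best_k_db = min(scores, key=lambda k: scores[k][1])  # minimize D-B
print("k by Calinski-Harabasz:", best_k_ch)
print("k by Davies-Bouldin:", best_k_db)
```

Swapping `euclidean` for a Mahalanobis distance, or `average_linkage` for Ward's method, would give the kind of method and distance combinations the abstract compares. On real data the two coefficients need not agree.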
Similar resources
Automatic Data Clustering Using an Improved Imperialist Competitive Algorithm
Imperialist Competitive Algorithm (ICA) is considered as a prime meta-heuristic algorithm to find the general optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled data sets. By using proper structure for each of the chromosomes and the ICA, at run time, the suggested method (ACICA) finds the optimum number of clusters while optim...
Determining the Best K for Clustering Transactional Datasets: A Coverage Density-based Approach
The problem of determining the optimal number of clusters is important but mysterious in cluster analysis. In this paper, we propose a novel method to find a set of candidate optimal number Ks of clusters in transactional datasets. Concretely, we propose Transactional-cluster-modes Dissimilarity based on the concept of coverage density as an intuitive transactional inter-cluster dissimilarity m...
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...
Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)
Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time-consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...